1. Which option do you plan to pursue?
I plan to pursue option 2 and build a cohesive infographic-style visualization. It will include three different but complementary visualizations that tell a story with my Strava data.
2. Restate your question(s). Has this changed at all since HW #1? If yes, how so?
I originally had two datasets I was planning to explore. Now that my Strava data is working, I’ve chosen to pursue that dataset and change my question. My main overarching question is: How should I prepare for my next marathon? Using my Strava data, I can plot my past marathon training habits and results by pulling the activities linked to my Strava account.
My sub-questions for each part of the infographic are: How should I ramp up my training week by week and month by month leading up to the race? What other types of activities should I include as cross-training? And what average distance and duration should I expect my runs to have over the training period?
3. Explain which variables from your dataset you will use to answer your question(s).
I have one dataset from the Strava API, and I use the {rStrava} package to access this data in R. After pulling my data through the API, I was able to read in the following variables related to my Strava activities:
id - unique identifier for the activity
name - named title of the activity
sport_type - type of activity
start_date - the start date and time of the activity (in ymdhms format)
year - year of the start date of the activity
month - month of the start date of the activity
day - day of the month of the start date of the activity
week_number - the week number in the year
total_miles - the total miles completed during the activity (if applicable)
elevation_gain_ft - the total elevation gain during the activity in feet (if applicable)
avg_speed_mi_hr - the average speed of the activity in miles per hour (if applicable)
Note: I have only listed the most relevant variables for my visualizations, as there are 49 potential variables to utilize in the dataset. With these variables, I can graph many different aspects of my Strava data.
Credit to Sam Csik for getting me started with this dataset by sharing clear code on GitHub, which let me wrangle the Strava data into a more accessible format.
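The date parts and imperial units listed above have to be derived from the raw activity records at some point. The sketch below is one hypothetical way to do that with {dplyr} and {lubridate}; it is not Sam Csik’s code. It assumes columns named `start_date` (a date-time), `distance` (meters), `total_elevation_gain` (meters) and `average_speed` (meters per second), which are the Strava API’s units; if the data were already converted when compiled, drop those conversions. Using ISO weeks for `week_number` is also an assumption.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: derive date parts and imperial units from raw activities.
# Assumes `start_date` (POSIXct), `distance` (m), `total_elevation_gain` (m)
# and `average_speed` (m/s), i.e. the Strava API's metric units.
add_strava_fields <- function(acts) {
  acts |>
    mutate(
      year = year(start_date),
      month = month(start_date),                          # numeric month, 1-12
      day = day(start_date),                              # day of the month
      week_number = isoweek(start_date),                  # ISO 8601 week (assumed convention)
      total_miles = distance / 1609.344,                  # meters -> miles
      elevation_gain_ft = total_elevation_gain * 3.28084, # meters -> feet
      avg_speed_mi_hr = average_speed * 2.236936          # m/s -> mph
    )
}
```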
4. Find at least two data visualizations that you could (potentially) borrow / adapt pieces from. Link to them or download and embed them into your .qmd file, and explain which elements you might borrow (e.g. the graphic form, legend design, layout, etc.).
Data Visualization Example #1:
The elements I plan to borrow from this graph are the faceting (though I might facet by month instead) and the calendar-day heatmap. I like this display because, over a shorter timeframe (6 months of training), the data will be easier to read, and the layout matches what a typical running training plan looks like.
Data Visualization Example #2:
Violin Example of Kilometers Run
From this visualization, I might borrow the violin plot idea, grouped by month, to show the distribution of miles over the months of training. This could show the distributions more clearly, especially since distance will increase as the marathon date gets closer.
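A minimal sketch of that violin idea with {ggplot2}, assuming one row per run with a `total_miles` column and a `month_name` column stored as a factor in calendar order (otherwise the months sort alphabetically). The helper name `plot_miles_violin()` is hypothetical. Jittered points sit on top so individual long runs stay visible; `height = 0` keeps the jitter from shifting the mileage values.

```r
library(ggplot2)

# Hypothetical helper: distribution of run distances per training month.
# Assumes one row per run with `month_name` (factor) and `total_miles`.
plot_miles_violin <- function(runs) {
  ggplot(runs, aes(x = month_name, y = total_miles)) +
    geom_violin(fill = "#0C7BDC", alpha = 0.6, scale = "width") + # one violin per month
    geom_jitter(width = 0.1, height = 0, alpha = 0.4, size = 1) + # individual runs, mileage unchanged
    labs(x = NULL, y = "Distance (miles)",
         title = "Distribution of Run Distances by Training Month")
}
```

Usage would look like `plot_miles_violin(strava_runs)` once the runs-only data frame exists.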
5. Hand-draw your anticipated three visualization infographic (option 2). Take a photo of your drawing and embed it in your rendered .qmd file – note that these are not exploratory visualizations, but rather your plan for your final visualizations that you will eventually polish and submit with HW #4.
Infographic Sketch for HW 4
This is an example of what my infographic will look like. I haven’t decided on the third section yet. In this mock-up, that section holds two smaller “fun” plots, such as a spider plot or a radial treemap. My alternative approach would be a single final graph, similar to a bar plot, that compares training weeks in another view. I am hoping for feedback on which type of plot makes the most sense for answering my larger question, because I am struggling to balance simple, clear plots against more advanced, abstract ones for some of my questions.
6. Mock up your visualizations using code. We understand that you will continue to iterate on these into HW #4 (particularly after receiving feedback), but by the end of HW #3, you should:
- use appropriate strategies to highlight / focus attention on a clear message
- include appropriate text such as titles, captions, axis labels
- experiment with colors and typefaces / fonts
- create a presentable / aesthetically-pleasing theme (e.g. (re)move gridlines / legends as appropriate, adjust font sizes, etc.)
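The hexplot and bar plot mock-ups in this section repeat the same Optima `theme()` settings. One hedged way to keep them consistent is a small theme function; `theme_strava()` is a hypothetical name, and the Optima font must be installed on the rendering machine (pass `family = "sans"` otherwise). It draws dashed grey gridlines on both axes, as the hexplot does; the bar plot only uses horizontal gridlines, so add `theme(panel.grid.major.x = element_blank())` there.

```r
library(ggplot2)

# Hypothetical reusable theme bundling the Optima settings shared by the
# hexplot and bar plot. `family` lets you fall back to "sans".
theme_strava <- function(family = "Optima") {
  theme(
    panel.background = element_rect(fill = "white"),
    panel.border = element_rect(colour = "lightgrey", fill = NA, linetype = 1),
    panel.grid.major = element_line(colour = "lightgrey", linetype = 2), # dashed x and y gridlines
    plot.title = element_text(family = family, face = "bold", size = 17),
    plot.subtitle = element_text(family = family, face = "italic", size = 10),
    axis.title = element_text(family = family, size = 9, face = "bold"),
    axis.text = element_text(family = family, size = 8),
    legend.title = element_text(family = family, size = 9, face = "bold")
  )
}
```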
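library(tidyverse) #core packages used below: readr, dplyr, ggplot2, stringr
library(paletteer) #for color palettes

#Reading in Data ---
strava_activities <- read_csv("strava_activities.csv", show_col_types = FALSE) #quieting the column specification message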
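The read step can emit a `New names` message because the CSV has an unnamed first column, which readr renames `...1`. This usually happens when a data frame is written with row names (for example by `write.csv()`), though that is an assumption about how this file was exported. A minimal sketch of a quieter read that also drops that index column; `read_strava_csv()` is a hypothetical helper.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read the exported activities quietly and drop the
# unnamed row-index column that readr renames to `...1`.
read_strava_csv <- function(path) {
  suppressMessages(read_csv(path, show_col_types = FALSE)) |> # silences the name-repair message
    select(-any_of("...1"))                                   # no error if the column is absent
}
```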
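#Wrangling Data ---
##creating a month name column (a factor in calendar order so the facets run May -> October, not alphabetically)
strava_activities$month_name <- factor(month.name[strava_activities$month], levels = month.name)
##filtering to just the timeframe of marathon training (May-October 2023)
filtered_strava_activities <- strava_activities %>%
  filter(year == 2023 & month > 4 & month < 11)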
#Creating heatmap ---
ggplot(filtered_strava_activities, aes(x = training_week, y = day)) +
  geom_tile(aes(fill = total_miles)) + #specifying tile plot
  scale_fill_gradient(low = "#0C7BDC", high = "#DC3220",
                      name = "Total Daily Miles",
                      breaks = c(5, 10, 15, 20, 25, 30, 35, 40)) + #adding color, legend title, and legend breaks
  facet_wrap(~ month_name, scales = "free_x") + #wrapping data by month name
  labs(x = "Training Week",
       y = "Day of the Month",
       title = "Visualizing a 24-Week Marathon Training Plan",
       subtitle = "Utilizing Strava data to understand peak mileage weeks for training purposes.") + #adding labels
  scale_y_reverse() + #reversing the order of the y axis
  theme( #adding theme with adjustments to font, size of text, and features of text
    plot.title = element_text(family = "sans", face = "bold", size = 20),
    panel.grid.major.x = element_blank(),
    plot.subtitle = element_text(family = "sans", face = "italic", size = 10),
    axis.title = element_text(family = "sans", size = 10),
    axis.text.x = element_text(family = "sans", size = 8),
    axis.text.y = element_text(family = "sans", size = 8),
    legend.title = element_text(family = "sans", size = 10)
  )
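If more than one activity falls on the same day (say, a morning run and an evening ride), `geom_tile()` draws overlapping tiles rather than one tile holding the daily total, so the legend title can overstate what each tile shows. A hedged sketch of a pre-aggregation step, assuming the `month_name`, `training_week`, `day` and `total_miles` columns used in the heatmap; `daily_totals()` is a hypothetical helper, and `na.rm = TRUE` treats missing mileage (e.g. strength sessions) as zero.

```r
library(dplyr)

# Hypothetical pre-step for the heatmap: one row per calendar day, so each
# tile shows the summed mileage of every activity logged that day.
daily_totals <- function(acts) {
  acts |>
    group_by(month_name, training_week, day) |>
    summarise(total_miles = sum(total_miles, na.rm = TRUE), .groups = "drop")
}
```

It would slot in as `ggplot(daily_totals(filtered_strava_activities), aes(x = training_week, y = day))`, with the rest of the heatmap code unchanged.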
#Additional Wrangling ---
##filtering to just runs
strava_runs <- filtered_strava_activities %>%
  filter(sport_type %in% c("Run", "TrailRun"))
#Creating Hexplot ---
#note: geom_hex() needs the {hexbin} package installed
strava_hex <- strava_runs %>%
  ggplot(aes(x = total_miles, y = moving_time_sec / 3600)) + #converting moving time from seconds to hours
  geom_hex() + #specifying hex plot
  paletteer::scale_fill_paletteer_c("viridis::plasma",
                                    name = "Count of Runs",
                                    breaks = c(1, 4, 8, 12, 16, 20)) + #adding color scale and legend breaks and title
  labs(y = "Moving Time (Hours)", #the y values come from moving_time_sec, not elapsed time
       x = "Distance (Miles)",
       title = "Planning For Run Durations During Marathon Training",
       subtitle = "Utilizing Strava data to understand rough time allocation for lengths of runs.") + #labeling plot items
  theme(panel.background = element_rect(fill = "white"), #specifying my theme
        panel.border = element_rect(colour = "lightgrey", fill = NA, linetype = 1), #defining border layout
        panel.grid.major.x = element_line(colour = "lightgrey", linetype = 2), #defining gridlines
        panel.grid.major.y = element_line(colour = "lightgrey", linetype = 2), #defining gridlines
        plot.title = element_text(family = "Optima", face = "bold", size = 17), #defining plot title layout
        plot.subtitle = element_text(family = "Optima", face = "italic", size = 10), #defining plot subtitle layout
        axis.title = element_text(family = "Optima", size = 9, face = "bold"), #axis text titles layout
        axis.text.x = element_text(family = "Optima", size = 8), #x axis text values layout
        axis.text.y = element_text(family = "Optima", size = 8), #y axis text values layout
        legend.title = element_text(family = "Optima", size = 9, face = "bold") #legend text layout
  )

strava_hex +
  annotate( #adding annotation for the marathon
    geom = "text", x = 25, y = 4.3, label = "Marathon Day",
    size = 3, color = "black", hjust = "inward"
  ) +
  annotate( #adding an arrow pointing to the marathon point
    geom = "curve", x = 23, xend = 25, y = 3.8, yend = 3.9,
    curvature = 0.55, arrow = arrow(length = unit(0.3, "cm"))
  ) +
  annotate( #adding annotation for the Teton Crest Trail (outlier)
    geom = "text", x = 34.7, y = 9, label = "Teton Crest Trail",
    size = 3, color = "black", hjust = "inward"
  ) +
  annotate( #adding an arrow pointing to the TCT point
    geom = "curve", x = 35, xend = 36.3, y = 9, yend = 9.9,
    curvature = 0.35, arrow = arrow(length = unit(0.3, "cm"))
  )
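To turn the hexplot into a planning number for the question about expected run durations, one option is a straight-line fit of moving hours on distance. This is a simplification assumed here, not part of the original analysis: pace usually slows on longer runs, and outliers such as the Teton Crest Trail outing pull the line, so treat predictions (especially at 26.2 miles) as rough. `estimate_run_hours()` is a hypothetical helper expecting `total_miles` and `moving_time_sec` columns.

```r
# Hypothetical helper: linear fit of moving hours on distance, then
# predicted hours for the requested run lengths (in miles).
estimate_run_hours <- function(runs, miles) {
  fit <- lm(I(moving_time_sec / 3600) ~ total_miles, data = runs) # hours ~ miles
  unname(predict(fit, newdata = data.frame(total_miles = miles)))
}
```

For example, `estimate_run_hours(strava_runs, c(10, 20, 26.2))` would return rough hour estimates for those three distances.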
Visualization # 3: Bar Plot
#Additional wrangling ---
##summing total miles for each training week and sport type
total_by_training_week <- filtered_strava_activities |>
  group_by(sport_type, training_week) |>
  summarize(total_miles_by_week = sum(total_miles), .groups = "drop") #dropping the grouping also silences the summarise() message
##calculating average weekly miles over the 24-week plan (includes every activity type, not just runs)
avg_weekly_miles <- round(sum(total_by_training_week$total_miles_by_week) / 24, 2)
#Creating barplot ---
ggplot(total_by_training_week,
       aes(x = as.numeric(training_week),
           y = total_miles_by_week,
           fill = str_replace(sport_type, "TrailRun", "Trail Run"))) + #updating trail run to have a space
  geom_col() + #defining a bar plot
  scale_fill_paletteer_d("nord::aurora") + #specifying the color scheme
  labs(x = "Training Week", #adding labels to graph
       y = "Total Distance (in miles)",
       fill = "Type of Activity",
       title = "Evaluating Expected Total Weekly Miles for Marathon Training",
       subtitle = "Utilizing Strava data to understand weekly ramp up/down of running miles within a 6-month training period.") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) + #adding breaks to my x axis
  scale_y_continuous(breaks = scales::pretty_breaks(n = 5)) + #adding breaks to my y axis
  geom_hline(yintercept = avg_weekly_miles, linetype = "dashed", color = "black") + #adding a horizontal line for average weekly miles
  annotate( #adding annotation for the average weekly miles line
    geom = "text", x = 0, y = 48,
    label = paste0("Average Weekly Miles (", round(avg_weekly_miles, 1), ")"), #built from the computed value (previously hardcoded as 29.6) so it stays in sync
    size = 3, color = "black", hjust = "inward"
  ) +
  annotate( #adding an arrow pointing to the horizontal line
    geom = "curve", x = 2.5, xend = 2.5, y = 45, yend = 30,
    curvature = 0, arrow = arrow(length = unit(0.3, "cm"))
  ) +
  theme(panel.background = element_rect(fill = "white"), #specifying my theme
        panel.border = element_rect(colour = "lightgrey", fill = NA, linetype = 1), #defining border layout
        panel.grid.major.y = element_line(colour = "lightgrey", linetype = 2), #defining gridlines
        plot.title = element_text(family = "Optima", face = "bold", size = 17), #defining plot title layout
        plot.subtitle = element_text(family = "Optima", face = "italic", size = 10), #defining plot subtitle layout
        axis.title = element_text(family = "Optima", size = 9, face = "bold"), #axis text titles layout
        axis.text.x = element_text(family = "Optima", size = 8), #x axis text values layout
        axis.text.y = element_text(family = "Optima", size = 8), #y axis text values layout
        legend.title = element_text(family = "Optima", size = 9, face = "bold") #legend text layout
  )
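The bar plot answers the ramp-up question visually; a small table can make the week-to-week change explicit. This hedged sketch keeps only running activities by default (the bar plot’s average line includes every activity type), computes week-over-week percent change with `dplyr::lag()`, and divides by the 24 weeks of the plan. `weekly_ramp()` is a hypothetical helper; weeks with no runs have no row, so the change is measured against the previous week that has data.

```r
library(dplyr)

# Hypothetical helper: weekly running miles, week-over-week % change, and
# the plan-wide average. Assumes `sport_type`, `training_week`, `total_miles`.
weekly_ramp <- function(acts, sports = c("Run", "TrailRun"), n_weeks = 24) {
  weekly <- acts |>
    filter(sport_type %in% sports) |>
    group_by(training_week) |>
    summarise(miles = sum(total_miles, na.rm = TRUE), .groups = "drop") |>
    arrange(training_week) |>
    mutate(pct_change = 100 * (miles - dplyr::lag(miles)) / dplyr::lag(miles)) # NA for the first week
  list(weekly = weekly, avg_weekly_miles = sum(weekly$miles) / n_weeks)
}
```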
This last plot is still a work in progress. I’m not sure yet whether I will include it in my infographic, but I didn’t want to lose the code in case I decide to use a spider plot in my final infographic. It is not part of my “3 chosen plots” for this homework.
#Spider Plot - WORK IN PROGRESS
## Wrangling
strava_runs2 <- filtered_strava_activities %>%
  group_by(month_name, sport_type) %>%
  summarise(n = n(), .groups = "drop") #counting activities per month and sport type
## Spider Plot
ggplot(strava_runs2, aes(x = month_name, y = n, group = sport_type, color = sport_type)) +
  geom_polygon(fill = NA, linewidth = 1) + #linewidth replaces the deprecated size for lines (ggplot2 >= 3.4.0)
  geom_point(size = 2, shape = 21, fill = "white") +
  geom_text(aes(label = n), position = position_stack(vjust = 0.1), size = 4) +
  coord_polar(start = 2) +
  labs(title = "Sport Type Comparison by Month During Training",
       color = "Sport Type",
       x = NULL,
       y = NULL,
       caption = "Source: My Strava Data") +
  theme_minimal()
#Need to update the labels/numbers because it’s currently hideous: shorten the full month names and make a prettier legend
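One possible cleanup for the spider plot’s to-do list, sketched under assumptions: counts are taken from the numeric `month` column, months are shown as abbreviations in calendar order via the base R constant `month.abb`, outlines use `linewidth`, and the stacked `geom_text()` labels are left out to reduce clutter. `spider_counts()` and `plot_sport_spider()` are hypothetical helper names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical wrangling step: activities per month and sport type, with
# abbreviated month names kept in calendar order (unused months dropped).
spider_counts <- function(acts) {
  acts |>
    count(month, sport_type, name = "n") |>
    mutate(month_abb = droplevels(factor(month.abb[month], levels = month.abb)))
}

# Hypothetical plotting step: one closed outline per sport type.
plot_sport_spider <- function(counts) {
  ggplot(counts, aes(x = month_abb, y = n, group = sport_type, colour = sport_type)) +
    geom_polygon(fill = NA, linewidth = 1) +
    geom_point(size = 2, shape = 21, fill = "white") +
    coord_polar() +
    labs(x = NULL, y = NULL, colour = "Sport Type",
         title = "Sport Type Comparison by Month During Training",
         caption = "Source: My Strava Data") +
    theme_minimal()
}
```

Usage would be `plot_sport_spider(spider_counts(filtered_strava_activities))`.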
7. What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R?
The challenges I encountered with this homework revolved mostly around wrangling the data. This was quite time-consuming, but once it was done, a clean and tidy dataset let me build my visualizations much faster. I also struggled to pick which advanced plots to use, since simple plots of this data (like histograms and density plots) can already tell a powerful story. For my final assignment, I will have to figure out which plots represent my data most clearly and keep iterating on the ones I have created.
8. What ggplot extension tools / packages do you need to use to build your visualizations? Are there any that we haven’t covered in class that you’ll be learning how to use for your visualizations?
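I used the {rStrava} package to access my data, but not to build my visualizations. For the visualizations themselves I’m using standard ggplot2 tools, plus {paletteer} for color palettes (geom_hex() also relies on the {hexbin} package being installed).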
9. What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear?
From my peers and the instructional team, I’m hoping for feedback on whether my graphs clearly answer the questions I’m asking, as well as other plot ideas for answering them. Since there are few examples of advanced plots built from Strava data, I’d love to brainstorm with others about different ways to showcase this data.